Abstract

Evolutionary game theory combines game theory and dynam- 
ical systems and is customarily adopted to describe evolu- 
tionary dynamics in multi-agent systems. In particular, it has 
been proven to be a successful tool to describe multi-agent 
learning dynamics. To the best of our knowledge, we pro- 
vide in this paper the first replicator dynamics applicable to 
the sequence form of an extensive-form game, allowing an 
exponential reduction of time and space w.r.t. the currently 
adopted replicator dynamics for normal form. Furthermore, 
our replicator dynamics is realization equivalent to the stan- 
dard replicator dynamics for normal form. We prove our re- 
sults for both discrete-time and continuous-time cases. Fi- 
nally, we extend standard tools to study the stability of a strat- 
egy profile to our replicator dynamics. 

Introduction 

Game theory provides the most elegant tools to model strategic
interaction situations among rational agents. These situations are
customarily modeled as games (FT91) in which the mechanism describes
the rules and strategies describe the behavior of the agents.
Furthermore, game theory provides a number of solution concepts. The
central one is Nash equilibrium. Game theory assumes agents to be
rational and describes "static" equilibrium states. Evolutionary game
theory (Cre03) drops the assumption of rationality and assumes agents
to be adaptive in the attempt to describe dynamics of evolving
populations. Interestingly, there are strict relations between game
theory solution concepts and evolutionary game theory steady states,
e.g., Nash equilibria are steady states. Evolutionary game theory is
commonly adopted to study economic evolving populations (CNP07) and
artificial multi-agent systems, e.g., for describing multi-agent
learning dynamics (THV06, TP07, PTL08) and as heuristics in
algorithms (KMT11). In this paper, we develop efficient techniques for
evolutionary dynamics with extensive-form games.

Extensive-form games are a very important class of games. They
provide a richer representation than strategic-form games, the
sequential structure of decision-making being described explicitly and
each agent being free to change her mind as events unfold. The study
of extensive-form games is carried out by translating the game by
means of tabular representations (SLB08). The most common is the
normal form. Its advantage is that all the techniques applicable to
strategic-form games can be adopted also with this representation.
However, the size of the normal form grows exponentially with the size
of the game tree, thus being impractical. The agent form is an
alternative representation whose size is linear in the size of the
game tree, but it makes, even with two agents, each agent's
best-response problem highly non-linear. To circumvent these issues,
the sequence form was proposed (vS96). This form is linear in the size
of the game tree and does not introduce non-linearities in the
best-response problem. On the other hand, standard techniques for
strategic-form games, e.g. (LH64), cannot be adopted with such
representation, thus requiring alternative ad hoc techniques, e.g.
(Lem78). In addition, sequence form is more expressive than normal
form. For instance, working with sequence form it is possible to find
Nash-equilibrium refinements for extensive-form games (perfection-based
Nash equilibria and sequential equilibrium (MS10, GI11)), while it is
not possible with normal form.

To the best of our knowledge, there is no result dealing with the
adoption of evolutionary game theory tools with sequence form for the
study of extensive-form games, all the known results working with the
normal form (Cre03). In this paper, we explore this topic for the
first time, providing the following main contributions.

• We show that the standard replicator dynamics for nor- 
mal form cannot be adopted with the sequence form, the 
strategies produced by replication not being well-defined 
sequence-form strategies. 

• We design an ad hoc version of the discrete-time repli- 
cator dynamics for sequence form and we show that it is 
sound, the strategies produced by replication being well- 
defined sequence-form strategies. 

• We show that our replicator dynamics is realization equiv- 
alent to the standard discrete-time replicator dynamics for 
normal form and therefore that the two replicator dynam- 
ics evolve in the same way. 

• We extend our discrete-time replicator dynamics to the 
continuous-time case, showing that the same properties 
are satisfied and extending standard tools to study the sta- 
bility of the strategies to our replicator. 



Game theoretical preliminaries 

Extensive-form game definition. A perfect-information extensive-form
game (FT91) is a tuple $(N, A, V, T, \iota, \rho, \chi, u)$, where:
$N$ is the set of agents ($i \in N$ denotes a generic agent), $A$ is
the set of actions ($A_i \subseteq A$ denotes the set of actions of
agent $i$ and $a \in A$ denotes a generic action), $V$ is the set of
decision nodes ($V_i \subseteq V$ denotes the set of decision nodes of
$i$), $T$ is the set of terminal nodes ($w \in V \cup T$ denotes a
generic node and $w_0$ is the root node), $\iota : V \to N$ returns the
agent that acts at a given decision node, $\rho : V \to \wp(A)$ returns
the actions available to agent $\iota(w)$ at $w$,
$\chi : V \times A \to V \cup T$ assigns the next (decision or
terminal) node to each pair $(w, a)$ where $a$ is available at $w$, and
$u = (u_1, \ldots, u_{|N|})$ is the set of agents' utility functions
$u_i : T \to \mathbb{R}$. Games with imperfect information extend those
with perfect information, allowing one to capture situations in which
some agents cannot observe some actions undertaken by other agents. We
denote by $V_{i,h}$ the $h$-th information set of agent $i$. An
information set is a set of decision nodes such that when an agent
plays at one of such nodes she cannot distinguish the node in which she
is playing. For the sake of simplicity, we assume that every
information set has a different index $h$, thus we can univocally
identify an information set by $h$. Furthermore, since the available
actions at all nodes $w$ belonging to the same information set $h$ are
the same, with abuse of notation, we write $\rho(h)$ in place of
$\rho(w)$ with $w \in V_{i,h}$. An imperfect-information game is a
tuple $(N, A, V, T, \iota, \rho, \chi, u, H)$ where
$(N, A, V, T, \iota, \rho, \chi, u)$ is a perfect-information game and
$H = (H_1, \ldots, H_{|N|})$ induces a partition
$V_i = \bigcup_{h \in H_i} V_{i,h}$ such that for all
$w, w' \in V_{i,h}$ we have $\rho(w) = \rho(w')$. We focus on games
with perfect recall where each agent recalls all her own previous
actions and the ones of the opponents (FT91).




Figure 1: Example of two-agent perfect-information 
extensive-form game, x.y denote the y-th node of agent x. 



(Reduced) Normal form (vNM44). It is a tabular representation in which
each normal-form action, called plan and denoted by $p \in P_i$ where
$P_i$ is the set of plans of agent $i$, specifies one action $a$ per
information set. We denote by $\pi_i$ a normal-form strategy of agent
$i$ and by $\pi_i(p)$ the probability associated with plan $p$. The
number of plans (and therefore the size of the normal form) is
exponential in the size of the game tree. The reduced normal form is
obtained from the normal form by deleting replicated strategies (VJ98).
Although the reduced normal form can be much smaller than the normal
form, it is still exponential in the size of the game tree.
Example 1 The reduced normal form of the game in Fig. 1
and a pair of normal-form strategies are:



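[Payoff table: agent 1's plans $L_1*$, $R_1L_2L_3$, $R_1L_2R_3$, $R_1R_2L_3$, $R_1R_2R_3$ against agent 2's plans $l$, $r$. Only the entries 3,1; 3,1; 3,3; 3,3; 2,1 survive in the source and cannot be assigned to cells.]

$\pi_1 = (\pi_{1,L_1*}, \pi_{1,R_1L_2L_3}, \pi_{1,R_1L_2R_3}, \pi_{1,R_1R_2L_3}, \pi_{1,R_1R_2R_3}) = (1/3,\ 0,\ 1/3,\ 0,\ 1/3)$, $\quad \pi_2 = (\pi_{2,l}, \pi_{2,r}) = (1,\ 0)$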









Agent form (Kuh50, Sel75). It is a tabular representation in which
each agent is replicated in a number of fictitious agents, one per
information set, and all the fictitious agents of the same agent have
the same utility. A strategy is commonly said behavioral and denoted
by $\sigma_i$. We denote by $\sigma_i(a)$ the probability associated
with action $a \in A_i$. The agent form is linear in the size of the
game tree.

Sequence form (vS96). It is a representation constituted by a table
and a set of constraints. Sequence-form actions are called sequences.
A sequence $q \in Q_i$ of agent $i$ is a set of consecutive actions
$a \in A_i$ where $Q_i \subseteq Q$ is the set of sequences of agent
$i$ and $Q$ is the set of all the sequences. A sequence can be
terminal, if, combined with some sequence of the opponents, it leads to
a terminal node, or non-terminal otherwise. In addition, the initial
sequence of every agent, denoted by $q_\emptyset$, is said empty
sequence and, given sequence $q \in Q_i$ leading to some information
set $h \in H_i$, we say that $q'$ extends $q$ and we denote by
$q' = q|a$ if the last action of $q'$ (denoted by $a(q') = a'$) is some
action $a \in \rho(h)$ and $q$ leads to $h$. We denote by $w = h(q)$
the node $w$ with $a(q) \in \rho(w)$; by $q' \subseteq q$ a subsequence
of $q$; by $x_i$ the sequence-form strategy of agent $i$ and by
$x_i(q)$ the probability associated with sequence $q \in Q_i$. Finally,
condition $q \to h$ is true if sequence $q$ crosses information set
$h$. Well-defined strategies are such that, for every information set
$h \in H_i$, the probability $x_i(q)$ assigned to the sequence $q$
leading to $h$ is equal to the sum of the probabilities $x_i(q')$ where
$q'$ extends $q$ at $h$. Sequence-form constraints are
$x_i(q_\emptyset) = 1$ and $x_i(q) = \sum_{a \in \rho(w)} x_i(q|a)$ for
every sequence $q$, action $a$, node $w$ such that $w = h(q|a)$, and
for every agent $i$. The agent $i$'s utility is represented as a sparse
multi-dimensional array, denoted, with an abuse of notation, by $U_i$,
specifying the value associated with every combination of terminal
sequences of all the agents. The size of the sequence form is linear in
the size of the game tree.
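The sequence-form constraints can be checked mechanically. The following is a minimal Python sketch, assuming a hypothetical tree for agent 1 in which information set 1.1 offers L1/R1 and both 1.2 (L2/R2) and 1.3 (L3/R3) are reached through R1; the names `Q1`, `INFOSETS`, `is_well_defined` and the strategy `x1` are illustrative and not taken from the paper.

```python
from fractions import Fraction as F

# Agent 1's sequences as tuples of actions; the empty tuple is the empty sequence.
Q1 = [(), ("L1",), ("R1",), ("R1", "L2"), ("R1", "R2"), ("R1", "L3"), ("R1", "R3")]

# Assumed information sets: label -> (sequence leading to it, available actions).
INFOSETS = {
    "1.1": ((), ["L1", "R1"]),
    "1.2": (("R1",), ["L2", "R2"]),
    "1.3": (("R1",), ["L3", "R3"]),
}


def is_well_defined(x):
    """Check x(q_empty) = 1 and x(q) = sum_a x(q|a) at every information set."""
    if x[()] != 1:
        return False
    return all(
        x[q] == sum(x[q + (a,)] for a in actions)
        for q, actions in INFOSETS.values()
    )


# An illustrative strategy (an assumption, not the one of Example 2).
x1 = dict(zip(Q1, [F(1), F(1, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3), F(1, 3)]))
```

Exact rationals are used so that the equalities are tested without floating-point tolerance.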

Example 2 The sequence form of the game in Fig. 1 and a
pair of sequence-form strategies are:

Replicator dynamics. The standard discrete-time replicator
equation with two agents is (Cre03):



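$$\pi_1(p, t+1) = \pi_1(p, t) \cdot \frac{e_p^T \cdot U_1 \cdot \pi_2(t)}{\pi_1^T(t) \cdot U_1 \cdot \pi_2(t)} \qquad (1)$$

$$\pi_2(p, t+1) = \pi_2(p, t) \cdot \frac{\pi_1^T(t) \cdot U_2 \cdot e_p}{\pi_1^T(t) \cdot U_2 \cdot \pi_2(t)} \qquad (2)$$

while the continuous-time one is

$$\dot{\pi}_1(p) = \pi_1(p) \cdot [(e_p - \pi_1)^T \cdot U_1 \cdot \pi_2] \qquad (3)$$

$$\dot{\pi}_2(p) = \pi_2(p) \cdot [\pi_1^T \cdot U_2 \cdot (e_p - \pi_2)] \qquad (4)$$

where $e_p$ is the vector in which the $p$-th component is "1"
and the others are "0".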
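As an illustration of (1)-(2), the following Python sketch performs one discrete-time step on a small bimatrix game. The payoff matrices and the starting strategies are hypothetical, and `replicator_step` is an illustrative name; payoffs are taken strictly positive so that the denominators never vanish.

```python
from fractions import Fraction as F


def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def replicator_step(pi1, pi2, U1, U2):
    """One discrete-time step of (1)-(2) for a two-agent normal-form game."""
    u1 = [dot(row, pi2) for row in U1]          # e_p^T U1 pi2 for each plan p of agent 1
    u2 = [dot(pi1, col) for col in zip(*U2)]    # pi1^T U2 e_p for each plan p of agent 2
    avg1, avg2 = dot(pi1, u1), dot(pi2, u2)     # average utilities
    new1 = [p * u / avg1 for p, u in zip(pi1, u1)]
    new2 = [p * u / avg2 for p, u in zip(pi2, u2)]
    return new1, new2


# Hypothetical 2x2 game (not from the paper).
U1 = [[F(3), F(1)], [F(2), F(2)]]
U2 = [[F(1), F(2)], [F(3), F(1)]]
pi1, pi2 = [F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]
new1, new2 = replicator_step(pi1, pi2, U1, U2)
```

Each updated strategy still sums to one, because every component is rescaled by its utility relative to the average.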

Discrete-time replicator dynamics for 
sequence-form representation 

Initially, we show that the standard discrete-time replicator dynamics
for normal form cannot be directly applied when sequence form is
adopted. Standard replicator dynamics applied to the sequence form is
easily obtained by considering each sequence $q$ as a plan $p$ and thus
substituting $e_q$ for $e_p$ in (1)-(2), where $e_q$ is zero for all
the components $q'$ such that $q' \neq q$ and one for the component
$q'$ such that $q' = q$.

Proposition 3 The replicator (1)-(2) does not satisfy the
sequence-form constraints.

Proof. The proof is by counterexample. Consider $x_1(t)$ and $x_2(t)$
equal to the strategies used in Example 2. At time $t+1$ the strategy
profile $(x_1(t+1), x_2(t+1))$ generated by (1)-(2) [the numerical
entries of the two vectors are not recoverable from the source] does
not satisfy the sequence-form constraints, e.g.,
$x_i(q_\emptyset, t+1) \neq 1$ for all $i$. □
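The failure can be reproduced numerically. The sketch below applies the normal-form update with $e_q$ to a sequence-form strategy of agent 1; the tree structure, the sparse payoffs in `PAYOFF1` and both strategies are assumptions made for illustration, not the data of Example 2.

```python
from fractions import Fraction as F

Q1 = ["q0", "L1", "R1", "R1L2", "R1R2", "R1L3", "R1R3"]   # sequences of agent 1
Q2 = ["q0", "l", "r"]                                      # sequences of agent 2

# Hypothetical utility of agent 1, non-zero only for pairs of terminal sequences.
PAYOFF1 = {("L1", "q0"): 2, ("R1L2", "l"): 3, ("R1R2", "l"): 1,
           ("R1L3", "r"): 4, ("R1R3", "r"): 2}
U1 = [[F(PAYOFF1.get((q1, q2), 0)) for q2 in Q2] for q1 in Q1]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def naive_step(x1, x2, U1):
    """Equation (1) with e_q in place of e_p: e_q simply selects the q-th row of U1."""
    row_utils = [dot(row, x2) for row in U1]    # e_q^T U1 x2
    avg = dot(x1, row_utils)                    # x1^T U1 x2
    return [x * u / avg for x, u in zip(x1, row_utils)]


x1 = [F(1), F(1, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3), F(1, 3)]
x2 = [F(1), F(1), F(0)]
x1_next = naive_step(x1, x2, U1)
```

In this sketch the rows of non-terminal sequences, including the empty one, carry no utility, so their updated probability collapses to zero and the constraint on the empty sequence is violated.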

The critical issue behind the failure of the standard replicator
dynamics lies in the definition of vector $e_q$. Now we describe how
the standard discrete-time replicator dynamics can be modified to be
applied to the sequence form. In our variation, we substitute $e_q$
with a suitable vector $g_q$ that depends on the strategy $x_i(t)$ and
is generated as described in Algorithm 1, obtaining:

$$x_1(q, t+1) = x_1(q, t) \cdot \frac{g_q^T(x_1(t)) \cdot U_1 \cdot x_2(t)}{x_1^T(t) \cdot U_1 \cdot x_2(t)} \qquad (5)$$

$$x_2(q, t+1) = x_2(q, t) \cdot \frac{x_1^T(t) \cdot U_2 \cdot g_q(x_2(t))}{x_1^T(t) \cdot U_2 \cdot x_2(t)} \qquad (6)$$

The basic idea behind the construction of vector gq is: 

• assigning "1" to the probability of all the sequences con- 
tained in q, 

• normalizing the probability of the sequences extending the
sequences contained in q,

• assigning "0" to the probability of all the other sequences.

We describe the generation of vector $g_q(x_i(t))$; for clarity, we use
as running example the generation of $g_{R_1R_3}(x_1(t))$ related to
Example 2.

• all the components of $g_q(x_i(t))$ are initialized equal to "0",
e.g.,

$g_{R_1R_3}^T(x_1(t)) = [\,0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\,]$

• if sequence $q$ is played, the algorithm assigns:

- "1" to all the components $g_q(q', x_i(t))$ of $g_q(x_i(t))$ where
$q' \subseteq q$ (i.e., $q'$ is a subsequence of $q$), e.g.,

$g_{R_1R_3}^T(x_1(t)) = [\,1\ \ 0\ \ 1\ \ 0\ \ 0\ \ 0\ \ 1\,]$

- "$\frac{x_i(q'',t)}{x_i(q',t)}$" to all the components
$g_q(q'', x_i(t))$ of $g_q(x_i(t))$ where $q' \subseteq q$ with
$q' = q'' \cap q$ and sequence $q''$ is defined as
$q'' = q'|a|\ldots$ with $a \in \rho(h)$ and $q \not\to h$ (i.e., $q'$
is a subsequence of $q$ and $q''$ extends $q'$ off the path identified
by $q$), e.g.,

$g_{R_1R_3}^T(x_1(t)) = [\,1\ \ 0\ \ 1\ \ \tfrac{1}{2}\ \ \tfrac{1}{2}\ \ 0\ \ 1\,]$

- all the other components are left equal to "0",

• if sequence $q$ is not played, $g_q(x_i(t))$ can be arbitrary,
since the $q$-th equation of (5)-(6) is always zero given
that $x_i(q, t) = 0$ for every $t$.

Algorithm 1 generate_g_q($x_i(t)$)

$g_q(x_i(t)) = 0$
if $x_i(q,t) \neq 0$ then
    for $q' \in Q_i$ s.t. $q' \subseteq q$ do
        $g_q(q', x_i(t)) = 1$
        for $q'' \in Q_i$ s.t. $q'' \cap q = q'$ and $q'' = q'|a|\ldots : a \in \rho(h),\ q \not\to h$ do
            $g_q(q'', x_i(t)) = \frac{x_i(q'',t)}{x_i(q',t)}$
return $g_q(x_i(t))$
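The construction of $g_q$ described above can be sketched in Python as follows. The tree structure encoded in `INFOSET_OF` and the strategy `x1` are assumptions made for illustration (the figure and the strategies of Example 2 are not available here), and `generate_g` is an illustrative name; for an unplayed sequence the sketch returns the all-zero vector, which is one of the arbitrary choices allowed.

```python
from fractions import Fraction as F

Q1 = [(), ("L1",), ("R1",), ("R1", "L2"), ("R1", "R2"), ("R1", "L3"), ("R1", "R3")]
# Assumed information set of each action of agent 1.
INFOSET_OF = {"L1": "1.1", "R1": "1.1", "L2": "1.2", "R2": "1.2", "L3": "1.3", "R3": "1.3"}


def common_prefix(s, q):
    """Longest common prefix of two sequences, i.e. their intersection."""
    n = 0
    while n < min(len(s), len(q)) and s[n] == q[n]:
        n += 1
    return s[:n]


def generate_g(q, x):
    """Build g_q(x) as a dict over the sequences of x."""
    g = {s: F(0) for s in x}
    if x[q] == 0:
        return g                              # q is not played: any vector would do
    crossed = {INFOSET_OF[a] for a in q}      # information sets crossed by q
    for s in x:
        qp = common_prefix(s, q)              # q' = s intersected with q
        if s == qp:
            g[s] = F(1)                       # s is a subsequence of q
        elif INFOSET_OF[s[len(qp)]] not in crossed:
            g[s] = x[s] / x[qp]               # s leaves q at an information set q does not cross
    return g


x1 = dict(zip(Q1, [F(1), F(1, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3), F(1, 3)]))
g = generate_g(("R1", "R3"), x1)
```

With this strategy the sketch yields the vector of the running example, $[1\ 0\ 1\ \frac{1}{2}\ \frac{1}{2}\ 0\ 1]$, and for the empty sequence it returns the strategy itself.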

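All the vectors $g_q(x_1(t))$ of Example 2 are:

[Table: one column per vector $g_{q_\emptyset}$, $g_{L_1}$, $g_{R_1}$, $g_{R_1L_2}$, $g_{R_1R_2}$, $g_{R_1L_3}$, $g_{R_1R_3}$ and one row per sequence $q_\emptyset$, $L_1$, $R_1$, $R_1L_2$, $R_1R_2$, $R_1L_3$, $R_1R_3$; the entries are not recoverable from the source.]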

We show that replicator dynamics (5)-(6) do not violate
sequence-form constraints.

Theorem 4 Given a well-defined sequence-form strategy profile
$(x_1(t), x_2(t))$, the output strategy profile
$(x_1(t+1), x_2(t+1))$ of replicator dynamics (5)-(6) satisfies
sequence-form constraints.

Proof. The constraints forced by sequence form are:

• $x_i(q_\emptyset, t) = 1$ for every $i$,

• $x_i(q,t) = \sum_{a \in \rho(w)} x_i(q|a,t)$ for every sequence $q$,
action $a$, node $w$ such that $w = h(q|a)$, and for every agent $i$.

Assume, by hypothesis of the theorem, that the above constraints are
satisfied at $t$; we need to prove that constraints

$$x_i(q_\emptyset, t+1) = 1 \qquad (7)$$

$$x_i(q, t+1) = \sum_{a \in \rho(w)} x_i(q|a, t+1) \qquad (8)$$

are satisfied. Constraint (7) always holds because
$g_{q_\emptyset}(x_i(t)) = x_i(t)$. We rewrite constraints (8) as



$$x_i(q,t) \cdot \frac{g_q^T(x_i(t)) \cdot U_i \cdot x_{-i}(t)}{x_i^T(t) \cdot U_i \cdot x_{-i}(t)} = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot \frac{g_{q|a}^T(x_i(t)) \cdot U_i \cdot x_{-i}(t)}{x_i^T(t) \cdot U_i \cdot x_{-i}(t)} \right) \qquad (9)$$

Conditions (9) hold if the following condition holds

$$x_i(q,t) \cdot g_q^T(x_i(t)) = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot g_{q|a}^T(x_i(t)) \right) \qquad (10)$$

Notice that condition (10) is a vector of equalities, one per
sequence $q'$. Condition (10) is trivially satisfied for components
$q'$ such that $g_q(q', x_i(t)) = 0$. To prove the condition
for all the other components, we introduce two lemmas.



Lemma 5 Constraint (10) holds for all components $g_q(q', x_i(t))$ of
$g_q(x_i(t))$ such that $q' \subseteq q$.

Proof. By construction, $g_q(q', x_i(t)) = 1$ for every
$q' \subseteq q$. For every extension $q|a$ of $q$, we have that
$q' \subseteq q \subset q|a$. For this reason
$g_{q|a}(q', x_i(t)) = 1$. Thus

$$x_i(q,t) \cdot g_q(q', x_i(t)) = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot g_{q|a}(q', x_i(t)) \right) \quad \text{iff} \quad x_i(q,t) \cdot 1 = \sum_{a \in \rho(w)} x_i(q|a,t) \cdot 1$$

that holds by hypothesis. Therefore the lemma is proved. □

Lemma 6 Constraint (10) holds for all components $g_q(q'', x_i(t))$ of
$g_q(x_i(t))$ where $q' \subseteq q$ with $q' = q'' \cap q$ and
sequence $q'' = q'|a'|\ldots$ with $a' \in \rho(h)$ and
$q \not\to h$.

Proof. For all $q''$, $g_q(q'', x_i(t)) = \frac{x_i(q'',t)}{x_i(q',t)}$
by construction. In the right side term of (10), for all $a$ we can
have either $q|a \not\subseteq q''$ or $q|a \subseteq q''$. In the
former we have that
$g_{q|a}(q'', x_i(t)) = \frac{x_i(q'',t)}{x_i(q',t)}$. In the latter
there exists only one action $a$ such that
$g_{q|a}(q'', x_i(t)) = \frac{x_i(q'',t)}{x_i(q|a,t)}$, while for the
other actions $a^*$ the value of $g_{q|a^*}(q'', x_i(t))$ is zero.
Hence, we can have two cases: if $q|a \not\subseteq q''$, then

$$x_i(q,t) \cdot g_q(q'', x_i(t)) = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot g_{q|a}(q'', x_i(t)) \right) \quad \text{iff} \quad x_i(q,t) \cdot \frac{x_i(q'',t)}{x_i(q',t)} = \sum_{a \in \rho(w)} x_i(q|a,t) \cdot \frac{x_i(q'',t)}{x_i(q',t)}$$

that holds by hypothesis, otherwise if $q|a \subseteq q''$, then

$$x_i(q,t) \cdot g_q(q'', x_i(t)) = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot g_{q|a}(q'', x_i(t)) \right) \quad \text{iff} \quad x_i(q,t) \cdot \frac{x_i(q'',t)}{x_i(q,t)} = x_i(q|a,t) \cdot \frac{x_i(q'',t)}{x_i(q|a,t)}$$

that always holds. Therefore the lemma is proved. □

From the application of Lemmas 5 and 6 it follows that
condition (10) holds. □
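Theorem 4 can be illustrated on a small instance. The sketch below applies one step of (5) to agent 1 and checks the sequence-form constraints on the result; the tree structure, the sparse payoffs `PAYOFF1` and the two strategies are hypothetical, and all function names are illustrative.

```python
from fractions import Fraction as F

# Assumed tree: 1.1 offers L1/R1; after R1 agent 2 plays l or r, leading to 1.2 or 1.3.
INFOSET_OF = {"L1": "1.1", "R1": "1.1", "L2": "1.2", "R2": "1.2", "L3": "1.3", "R3": "1.3"}
Q1 = [(), ("L1",), ("R1",), ("R1", "L2"), ("R1", "R2"), ("R1", "L3"), ("R1", "R3")]
# Hypothetical sparse utility array U1 over pairs of terminal sequences.
PAYOFF1 = {(("L1",), ()): 2, (("R1", "L2"), ("l",)): 3, (("R1", "R2"), ("l",)): 1,
           (("R1", "L3"), ("r",)): 4, (("R1", "R3"), ("r",)): 2}


def utility(y1, x2):
    """y1^T U1 x2 for the sparse array above."""
    return sum(u * y1[q1] * x2[q2] for (q1, q2), u in PAYOFF1.items())


def generate_g(q, x):
    """Sketch of Algorithm 1 for a played sequence q (x[q] != 0)."""
    crossed = {INFOSET_OF[a] for a in q}
    g = {}
    for s in x:
        n = 0
        while n < min(len(s), len(q)) and s[n] == q[n]:
            n += 1
        if n == len(s):
            g[s] = F(1)                       # s is a subsequence of q
        elif INFOSET_OF[s[n]] not in crossed:
            g[s] = x[s] / x[s[:n]]            # s leaves q at an uncrossed information set
        else:
            g[s] = F(0)
    return g


def replicator_step(x1, x2):
    """Equation (5): x1(q, t+1) = x1(q, t) * (g_q^T U1 x2) / (x1^T U1 x2)."""
    avg = utility(x1, x2)
    return {q: x1[q] * utility(generate_g(q, x1), x2) / avg if x1[q] else F(0) for q in x1}


def satisfies_constraints(x):
    return (x[()] == 1
            and x[("L1",)] + x[("R1",)] == x[()]
            and x[("R1", "L2")] + x[("R1", "R2")] == x[("R1",)]
            and x[("R1", "L3")] + x[("R1", "R3")] == x[("R1",)])


x1 = dict(zip(Q1, [F(1), F(1, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3), F(1, 3)]))
x2 = {(): F(1), ("l",): F(1, 2), ("r",): F(1, 2)}
x1_next = replicator_step(x1, x2)
```

Only agent 1 is updated here; the update of agent 2 through (6) would be symmetric.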

Replicator dynamics realization equivalence 

There is a well-known relation, based on the concept
of realization, between normal-form and sequence-form
strategies. In order to exploit it, we introduce two results
from (KMv96).

Definition 7 (Realization equivalent) Two strategies of an 
agent are realization equivalent if for any fixed strategies of 
the other agents, both strategies define the same probabili- 
ties for reaching the nodes of the game tree. 

Proposition 8 For an agent with perfect recall, any 
normal-form strategy is realization equivalent to a 
sequence-form strategy. 

We recall in addition that each pure sequence-form strategy
corresponds to a pure normal-form strategy in the reduced normal form
(KMv96). We can show that the evolutionary dynamics of (5)-(6) are
realization equivalent to the evolutionary dynamics of the normal-form
replicator dynamics and therefore that the two replicator dynamics
evolve in the same way.

Initially, we introduce the following lemma that we will
exploit to prove the main result.
Lemma 9 Given
• a reduced-normal-form strategy $\pi_i(t)$ of agent $i$,

• a sequence-form strategy $x_i(t)$ realization equivalent to
$\pi_i(t)$,

it holds that $x_i(q|a, t) \cdot g_{q|a}(x_i(t))$ is realization
equivalent to $\sum_{p \in P: a \in p} (\pi_i(p,t) \cdot e_p)$ for all
$a \in A_i$ and $q \in Q_i$ with $q|a \in Q_i$.

Proof. We denote by $x_p(t)$ the sequence-form strategy
realization equivalent to $e_p(t)$. According to (KMv96), we
can rewrite the thesis of the lemma as

$$x_i(q|a,t) \cdot g_{q|a}(x_i(t)) = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot x_p(t) \right) \quad \forall a \in A_i \qquad (11)$$

Notice that, for each action $a$ and sequence $q$ such that
$q|a \in Q_i$, condition (11) is a vector of equality conditions. Given
$a$ and $q$, two cases are possible:

1. $x_i(q|a,t) = 0$ and then $\sum_{p \in P: a \in p} \pi_i(p,t) = 0$,
thus conditions (11) hold;

2. $x_i(q|a,t) \neq 0$, in this case:

• for all components $g_{q|a}(q', x_i(t))$ of $g_{q|a}(x_i(t))$ and
$x_p(q',t)$ of $x_p(t)$ such that $q' \subseteq q|a$, we have that
$x_p(q',t) = 1$ for all $p \in P$ with $a \in p$ and Algorithm 1
sets $g_{q|a}(q', x_i(t)) = 1$, thus we can rewrite (11) as

$$x_i(q|a,t) \cdot g_{q|a}(q', x_i(t)) = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot x_p(q',t) \right) \quad \text{iff} \quad x_i(q|a,t) \cdot 1 = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot 1 \right)$$

that holds by hypothesis and thus conditions (11) hold;

• for all components $g_{q|a}(q'', x_i(t))$ of $g_{q|a}(x_i(t))$ and
$x_p(q'',t)$ of $x_p(t)$ such that $q'' \cap q = q'$ and sequence
$q'' = q'|a'|\ldots$ with $a' \in \rho(h)$ and $q \not\to h$, we have
that $x_p(q'',t) = 1$ for all $p \in P$ with $a, a(q'') \in p$ and
"0" otherwise, and Algorithm 1 sets
$g_{q|a}(q'', x_i(t)) = \frac{x_i(q'',t)}{x_i(q',t)}$, thus we can
rewrite (11) as

$$x_i(q|a,t) \cdot g_{q|a}(q'', x_i(t)) = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot x_p(q'',t) \right) \quad \text{iff} \quad x_i(q|a,t) \cdot \frac{x_i(q'',t)}{x_i(q',t)} = \sum_{p \in P: a, a(q'') \in p} \left( \pi_i(p,t) \cdot 1 \right)$$

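Using the relationship with the behavioral strategies,
we can write

$$x_i(q|a,t) \cdot \frac{x_i(q'',t)}{x_i(q',t)} = \prod_{a' \in q|a} \sigma_i(a',t) \cdot \frac{\prod_{a' \in q''} \sigma_i(a',t)}{\prod_{a' \in q'} \sigma_i(a',t)}$$

Being $q' \subseteq q|a$ and $q' \subseteq q''$ we have

$$x_i(q|a,t) \cdot \frac{x_i(q'',t)}{x_i(q',t)} = \prod_{a' \in q|a \setminus q'} \sigma_i(a',t) \cdot \prod_{a' \in q'} \sigma_i(a',t) \cdot \prod_{a' \in q'' \setminus q'} \sigma_i(a',t) = \prod_{a' \in \bigcup_{a^* \in \{a, a(q'')\}} q(a^*)} \sigma_i(a',t)$$

that can be easily rewritten (for details, see the Appendix) as

$$\sum_{p \in P: a, a(q'') \in p} \pi_i(p,t) = \prod_{a' \in \bigcup_{a^* \in \{a, a(q'')\}} q(a^*)} \sigma_i(a',t)$$

and therefore conditions (11) hold.

This completes the proof of the lemma. □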

Now we state the main result. It allows us to study the 
evolution of a strategy in a game directly in sequence form, 
instead of using the normal form, and it guarantees that the 
two dynamics (sequence and normal) are equivalent. 



Theorem 10 Given

• a normal-form strategy profile $(\pi_1(t), \pi_2(t))$ and its
evolution $(\pi_1(t+1), \pi_2(t+1))$ according to (1)-(2),

• a sequence-form strategy profile $(x_1(t), x_2(t))$ and its
evolution $(x_1(t+1), x_2(t+1))$ according to (5)-(6),

if $(\pi_1(t), \pi_2(t))$ and $(x_1(t), x_2(t))$ are realization
equivalent, then also $(\pi_1(t+1), \pi_2(t+1))$ and
$(x_1(t+1), x_2(t+1))$ are realization equivalent.

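Proof. Assume, by hypothesis of the theorem, that
$(x_1(t), x_2(t))$ is realization equivalent to
$(\pi_1(t), \pi_2(t))$. Thus, according to (KMv96), for every agent
$i$ it holds

$$x_i(q|a,t) = \sum_{p \in P: a \in p} \pi_i(p,t) \quad \forall a \in A_i$$

We need to prove that the following conditions hold:

$$x_i(q|a,t+1) = \sum_{p \in P: a \in p} \pi_i(p,t+1) \quad \forall a \in A_i \qquad (12)$$

By applying the definition of replicator dynamics, we can
rewrite the conditions (12) as:

$$x_i(q|a,t) \cdot \frac{g_{q|a}^T(x_i(t)) \cdot U_i \cdot x_{-i}(t)}{x_i^T(t) \cdot U_i \cdot x_{-i}(t)} = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot \frac{e_p^T \cdot U_i \cdot \pi_{-i}(t)}{\pi_i^T(t) \cdot U_i \cdot \pi_{-i}(t)} \right) \quad \forall a \in A_i \qquad (13)$$

Given that, by hypothesis,
$x_i^T(t) \cdot U_i \cdot x_{-i}(t) = \pi_i^T(t) \cdot U_i \cdot \pi_{-i}(t)$,
we can rewrite conditions (13) as:

$$x_i(q|a,t) \cdot g_{q|a}^T(x_i(t)) \cdot U_i \cdot x_{-i}(t) = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot e_p^T \cdot U_i \cdot \pi_{-i}(t) \right) \quad \forall a \in A_i$$

These conditions hold if and only if
$\sum_{p \in P: a \in p} (\pi_i(p,t) \cdot e_p)$ is realization
equivalent to $x_i(q|a,t) \cdot g_{q|a}(x_i(t))$. By
Lemma 9, this equivalence holds. □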
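The statement of Theorem 10 can be checked numerically on a toy instance. The sketch below runs one step of (1) on the reduced normal form and one step of (5) on the sequence form for agent 1, then compares them through the mapping $x_i(q|a) = \sum_{p: a \in p} \pi_i(p)$. The tree structure, the payoffs and the starting point are assumptions; in particular the normal-form strategy is built as a product of a hypothetical behavioral strategy, and agent 2, who has a single information set here, is kept fixed.

```python
from fractions import Fraction as F
from itertools import product

# Assumed tree: 1.1 offers L1/R1; after R1 agent 2 plays l or r, leading to 1.2 or 1.3.
INFOSET_OF = {"L1": "1.1", "R1": "1.1", "L2": "1.2", "R2": "1.2", "L3": "1.3", "R3": "1.3"}
Q1 = [(), ("L1",), ("R1",), ("R1", "L2"), ("R1", "R2"), ("R1", "L3"), ("R1", "R3")]
# Hypothetical payoffs of agent 1, keyed by (sequence of agent 1, action of agent 2 or None).
PAYOFF1 = {(("L1",), None): 2, (("R1", "L2"), "l"): 3, (("R1", "R2"), "l"): 1,
           (("R1", "L3"), "r"): 4, (("R1", "R3"), "r"): 2}
# Reduced-normal-form plans of agent 1.
PLANS = [("L1",)] + [("R1", a, b) for a, b in product(["L2", "R2"], ["L3", "R3"])]


def weight(a2, s2):
    """Probability that agent 2 plays a2 (None means agent 2 does not move)."""
    return 1 if a2 is None else s2[a2]


def seq_utility(y1, s2):
    """y1^T U1 x2 for a sequence-form vector y1 of agent 1."""
    return sum(u * y1[q] * weight(a2, s2) for (q, a2), u in PAYOFF1.items())


def plan_utility(p, s2):
    """e_p^T U1 pi2: a terminal counts when plan p contains its whole sequence."""
    return sum(u * weight(a2, s2) for (q, a2), u in PAYOFF1.items() if set(q) <= set(p))


def generate_g(q, x):
    """Sketch of Algorithm 1 for a played sequence q (x[q] != 0)."""
    crossed = {INFOSET_OF[a] for a in q}
    g = {}
    for s in x:
        n = 0
        while n < min(len(s), len(q)) and s[n] == q[n]:
            n += 1
        if n == len(s):
            g[s] = F(1)
        elif INFOSET_OF[s[n]] not in crossed:
            g[s] = x[s] / x[s[:n]]
        else:
            g[s] = F(0)
    return g


def seq_step(x1, s2):
    """Equation (5) for agent 1."""
    avg = seq_utility(x1, s2)
    return {q: x1[q] * seq_utility(generate_g(q, x1), s2) / avg for q in x1}


def normal_step(pi1, s2):
    """Equation (1) for agent 1."""
    avg = sum(pi1[p] * plan_utility(p, s2) for p in PLANS)
    return {p: pi1[p] * plan_utility(p, s2) / avg for p in PLANS}


def to_sequence_form(pi1):
    """x(q|a) is the total probability of the plans containing a."""
    return {q: F(1) if not q else sum(pi1[p] for p in PLANS if q[-1] in p) for q in Q1}


# Product-form starting point built from a hypothetical behavioral strategy.
sigma = {"L1": F(1, 3), "R1": F(2, 3), "L2": F(1, 2), "R2": F(1, 2), "L3": F(1, 2), "R3": F(1, 2)}
pi1 = {}
for p in PLANS:
    pi1[p] = F(1)
    for a in p:
        pi1[p] *= sigma[a]
x1 = to_sequence_form(pi1)
s2 = {"l": F(1, 2), "r": F(1, 2)}
```

The sequence-form step touches 7 components while the normal-form step touches 5 plans; the gap grows quickly with deeper trees, which is the saving the paper refers to.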

Continuous-time replicator dynamics for 
sequence-form representation 

The sequence-form continuous-time replicator equation is

$$\dot{x}_1(q,t) = x_1(q,t) \cdot [(g_q(x_1(t)) - x_1(t))^T \cdot U_1 \cdot x_2(t)] \qquad (14)$$

$$\dot{x}_2(q,t) = x_2(q,t) \cdot [x_1(t)^T \cdot U_2 \cdot (g_q(x_2(t)) - x_2(t))] \qquad (15)$$

Theorem 11 Given a well-defined sequence-form strategy profile
$(x_1(t), x_2(t))$, the output strategy profile
$(x_1(t+\Delta t), x_2(t+\Delta t))$ of replicator dynamics (14)-(15)
satisfies sequence-form constraints.

Proof. The constraints forced by sequence form are:

• $x_i(q_\emptyset, t) = 1$ for every $i$,

• $x_i(q,t) = \sum_{a \in \rho(w)} x_i(q|a,t)$ for every sequence $q$,
action $a$, node $w$ such that $w = h(q|a)$, and for every agent $i$.

Assume, by hypothesis of the theorem, that constraints are
satisfied at a given time point $t$; we need to prove that constraints

$$x_i(q_\emptyset, t+\Delta t) = 1 \qquad (16)$$

$$x_i(q, t+\Delta t) = \sum_{a \in \rho(w)} x_i(q|a, t+\Delta t) \qquad (17)$$

are satisfied. Constraint (16) always holds because
$g_{q_\emptyset}(x_i(t)) = x_i(t)$. We rewrite constraints (17) as

$$x_i(q,t) \cdot [(g_q(x_i(t)) - x_i(t))^T \cdot U_i \cdot x_{-i}(t)] = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot [(g_{q|a}(x_i(t)) - x_i(t))^T \cdot U_i \cdot x_{-i}(t)] \right) \qquad (18)$$

Conditions (18) hold if the following conditions hold

$$x_i(q,t) \cdot g_q^T(x_i(t)) = \sum_{a \in \rho(w)} \left( x_i(q|a,t) \cdot g_{q|a}^T(x_i(t)) \right) \qquad (19)$$

Notice that condition (19) is a vector of equalities. The above
condition is trivially satisfied for components $q'$ such that
$g_q(q', x_i(t)) = 0$. From the application of Lemmas 5 and 6
the condition (19) holds also for all the other components. □
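In the continuous-time case the constraints are preserved when the time derivatives themselves obey the linear constraints. The sketch below evaluates (14) for agent 1 on a hypothetical instance and exposes the derivative for such a check; the tree, the payoffs and the strategies are assumptions, and the function names are illustrative.

```python
from fractions import Fraction as F

# Assumed tree: 1.1 offers L1/R1; after R1 agent 2 plays l or r, leading to 1.2 or 1.3.
INFOSET_OF = {"L1": "1.1", "R1": "1.1", "L2": "1.2", "R2": "1.2", "L3": "1.3", "R3": "1.3"}
Q1 = [(), ("L1",), ("R1",), ("R1", "L2"), ("R1", "R2"), ("R1", "L3"), ("R1", "R3")]
# Hypothetical sparse utility array U1 over pairs of terminal sequences.
PAYOFF1 = {(("L1",), ()): 2, (("R1", "L2"), ("l",)): 3, (("R1", "R2"), ("l",)): 1,
           (("R1", "L3"), ("r",)): 4, (("R1", "R3"), ("r",)): 2}


def utility(y1, x2):
    """y1^T U1 x2 for the sparse array above."""
    return sum(u * y1[q1] * x2[q2] for (q1, q2), u in PAYOFF1.items())


def generate_g(q, x):
    """Sketch of Algorithm 1 for a played sequence q (x[q] != 0)."""
    crossed = {INFOSET_OF[a] for a in q}
    g = {}
    for s in x:
        n = 0
        while n < min(len(s), len(q)) and s[n] == q[n]:
            n += 1
        if n == len(s):
            g[s] = F(1)
        elif INFOSET_OF[s[n]] not in crossed:
            g[s] = x[s] / x[s[:n]]
        else:
            g[s] = F(0)
    return g


def derivative(x1, x2):
    """Equation (14): dx1(q)/dt = x1(q) * (g_q - x1)^T U1 x2, using linearity of the utility."""
    avg = utility(x1, x2)
    return {q: x1[q] * (utility(generate_g(q, x1), x2) - avg) if x1[q] else F(0) for q in x1}


x1 = dict(zip(Q1, [F(1), F(1, 3), F(2, 3), F(1, 3), F(1, 3), F(1, 3), F(1, 3)]))
x2 = {(): F(1), ("l",): F(1, 2), ("r",): F(1, 2)}
d = derivative(x1, x2)
```

A forward-Euler step `x1[q] + dt * d[q]` would therefore keep the strategy well defined for any step size that leaves the components non-negative.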

Theorem 12 Given

• a normal-form strategy profile $(\pi_1(t), \pi_2(t))$ and its
evolution $(\pi_1(t+\Delta t), \pi_2(t+\Delta t))$ according to
(3)-(4),

• a sequence-form strategy profile $(x_1(t), x_2(t))$ and its
evolution $(x_1(t+\Delta t), x_2(t+\Delta t))$ according to
(14)-(15),

if $(\pi_1(t), \pi_2(t))$ and $(x_1(t), x_2(t))$ are realization
equivalent, then also $(\pi_1(t+\Delta t), \pi_2(t+\Delta t))$ and
$(x_1(t+\Delta t), x_2(t+\Delta t))$ are realization equivalent.

Proof. Assume, by hypothesis of the theorem, that
$(x_1(t), x_2(t))$ is realization equivalent to
$(\pi_1(t), \pi_2(t))$. Thus, according to (KMv96), for every agent
$i$ it holds

$$x_i(q|a,t) = \sum_{p \in P: a \in p} \pi_i(p,t) \quad \forall a \in A_i$$

We need to prove that the following conditions hold:

$$x_i(q|a,t+\Delta t) = \sum_{p \in P: a \in p} \pi_i(p,t+\Delta t) \quad \forall a \in A_i \qquad (20)$$

By applying the definition of replicator dynamics, we can
rewrite the conditions (20) as:

$$x_i(q|a,t) \cdot [(g_{q|a}(x_i(t)) - x_i(t))^T \cdot U_i \cdot x_{-i}(t)] = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot [(e_p - \pi_i(t))^T \cdot U_i \cdot \pi_{-i}(t)] \right) \quad \forall a \in A_i \qquad (21)$$

Given that, by hypothesis,
$x_i^T(t) \cdot U_i \cdot x_{-i}(t) = \pi_i^T(t) \cdot U_i \cdot \pi_{-i}(t)$,
we can rewrite conditions (21) as:

$$x_i(q|a,t) \cdot g_{q|a}^T(x_i(t)) \cdot U_i \cdot x_{-i}(t) = \sum_{p \in P: a \in p} \left( \pi_i(p,t) \cdot e_p^T \cdot U_i \cdot \pi_{-i}(t) \right) \quad \forall a \in A_i$$

These conditions hold if and only if
$\sum_{p \in P: a \in p} (\pi_i(p,t) \cdot e_p)$ is realization
equivalent to $x_i(q|a,t) \cdot g_{q|a}(x_i(t))$. By
Lemma 9, this equivalence holds. □

Analyzing the stability of a strategy profile 

We focus on characterizing a strategy profile in terms of
evolutionary stability. When the continuous-time replicator dynamics
for normal form is adopted, evolutionary stability can be analyzed by
studying the eigenvalues of the Jacobian at that point (AP92):
non-positiveness of the eigenvalues is a necessary condition for
asymptotic stability, while strict negativeness of the eigenvalues is
sufficient. The Jacobian is

$$J = \begin{bmatrix} \dfrac{\partial \dot{x}_1(q_i,t)}{\partial x_1(q_j,t)} & \dfrac{\partial \dot{x}_1(q_i,t)}{\partial x_2(q_l,t)} \\[2ex] \dfrac{\partial \dot{x}_2(q_k,t)}{\partial x_1(q_j,t)} & \dfrac{\partial \dot{x}_2(q_k,t)}{\partial x_2(q_l,t)} \end{bmatrix} \qquad \forall q_i, q_j \in Q_1,\ q_k, q_l \in Q_2$$

In order to study the Jacobian of our replicator dynamics, we need to
complete the definition of $g_q(x_i(t))$. Indeed, we observe that some
components of $g_q(x_i(t))$ are left arbitrary by Algorithm 1:
precisely, the components $q''$ that are related to some $q'$ with
$x_i(q',t) = 0$. While it is not necessary to assign values to such
components during the evolution of the replicator dynamics, it is
necessary when we study the Jacobian. The rationale follows. If
$x_i(q',t) = 0$, then it will remain zero even after $t$. Instead, if,
after the dynamics converged to a point, such a point has
$x_i(q') = 0$ for some $q'$, it might be the case that along the
dynamics it holds $x_i(q') \neq 0$. Thus, in order to define these
components of $g_q(x_i(t))$, we need to reason backward, assigning the
values that they would have in the case such sequence would be played
with a probability that goes to zero. In absence of degeneracy,
Algorithm 2 addresses this issue assigning a value of "1" to a sequence
$q''$ if it is the (unique, the game being non-degenerate) best
response among the sequences extending $q'$ and "0" otherwise, because
at the convergence the agents play only the best response sequences.
Notice that, in this case, $g_q(x_i(t), x_{-i}(t))$ depends on both
agents' strategies.

Algorithm 2 generate_g_q($x_i(t)$, $x_{-i}(t)$)

1: $g_q(x_i(t), x_{-i}(t)) = 0$
2: for $q' \in Q_i$ s.t. $q' \subseteq q$ do
3:     $g_q(q', x_i(t), x_{-i}(t)) = 1$
4:     for $q'' \in Q_i$ s.t. $q'' \cap q = q'$ and $q'' = q'|a|\ldots : a \in \rho(h),\ q \not\to h$ do
5:         if $x_i(q',t) \neq 0$ then
6:             $g_q(q'', x_i(t), x_{-i}(t)) = \frac{x_i(q'',t)}{x_i(q',t)}$
7:         else if $q'' = \arg\max_{q^*: a(q^*) \in \rho(h)} E[U_i(q^*, x_{-i})]$ then
8:             $g_q(q'', x_i(t), x_{-i}(t)) = 1$
9: return $g_q(x_i(t), x_{-i}(t))$

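Given the above complete definition of $g_q$, we can observe that all
the components of $g_q(x_i(t), x_{-i}(t))$ generated by Algorithm 2 are
differentiable, being "0" or "1" or "$\frac{x_i(q'',t)}{x_i(q',t)}$".
Therefore, we can derive the Jacobian as:

$$\frac{\partial \dot{x}_1(q_i,t)}{\partial x_1(q_j,t)} = \begin{cases} (g_{q_i}(x_1(t),x_2(t)) - x_1(t))^T \cdot U_1 \cdot x_2(t) + x_1(q_i,t) \cdot \left( \dfrac{\partial g_{q_i}(x_1(t),x_2(t))}{\partial x_1(q_j,t)} - e_j \right)^T \cdot U_1 \cdot x_2(t) & \text{if } i = j \\[2ex] x_1(q_i,t) \cdot \left( \dfrac{\partial g_{q_i}(x_1(t),x_2(t))}{\partial x_1(q_j,t)} - e_j \right)^T \cdot U_1 \cdot x_2(t) & \text{if } i \neq j \end{cases}$$

$$\frac{\partial \dot{x}_1(q_i,t)}{\partial x_2(q_l,t)} = x_1(q_i,t) \cdot [(g_{q_i}(x_1(t),x_2(t)) - x_1(t))^T \cdot U_1 \cdot e_l]$$

$$\frac{\partial \dot{x}_2(q_k,t)}{\partial x_1(q_j,t)} = x_2(q_k,t) \cdot [e_j^T \cdot U_2 \cdot (g_{q_k}(x_2(t),x_1(t)) - x_2(t))]$$

$$\frac{\partial \dot{x}_2(q_k,t)}{\partial x_2(q_l,t)} = \begin{cases} x_1(t)^T \cdot U_2 \cdot (g_{q_k}(x_2(t),x_1(t)) - x_2(t)) + x_2(q_k,t) \cdot x_1(t)^T \cdot U_2 \cdot \left( \dfrac{\partial g_{q_k}(x_2(t),x_1(t))}{\partial x_2(q_l,t)} - e_l \right) & \text{if } k = l \\[2ex] x_2(q_k,t) \cdot x_1(t)^T \cdot U_2 \cdot \left( \dfrac{\partial g_{q_k}(x_2(t),x_1(t))}{\partial x_2(q_l,t)} - e_l \right) & \text{if } k \neq l \end{cases}$$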

With degenerate games, given an opponent's strategy profile
$x_{-i}(t)$ and a sequence $q \in Q_i$ such that $x_i(q,t) = 0$, we
can have multiple best responses. Consider, e.g., the game in
Example 2 with $x_1^T(t) = [\,1\ \ 1\ \ 0\ \ 0\ \ 0\ \ 0\ \ 0\,]$,
$x_2^T(t) = [\,1\ \ 1\ \ 0\,]$ and compute
$g_{R_1L_3}(x_1(t), x_2(t))$: both sequences $R_1L_2$ and $R_1R_2$ are
best responses to $x_2(t)$.

Reasoning backward, we have different vectors $g_q(x_i, x_{-i})$ for
different dynamics. More precisely, we can partition the strategy
space around $(x_i, x_{-i})$, associating a different best response
with a different subspace and therefore with a different
$g_q(x_i, x_{-i})$. Thus, in principle, in order to study the
stability of a strategy profile, we would need to compute and analyze
all the (potentially combinatorially many) Jacobians. However, we can
show that all these Jacobians have the same eigenvalues and therefore,
even in the degenerate case, we can safely study the Jacobian by using
a $g_q(x_i, x_{-i})$ as generated by Algorithm 2 except that, if there
are multiple best responses, Steps 7-8 assign "1" to only one,
randomly chosen, best response.

Theorem 13 Given

• a specific sequence $q \in Q_i$ such that $x_i(q,t) = 0$,

• a sequence-form strategy $x_{-i}(t)$,

• a sequence $q' \subseteq q$,

if the number of sequences $q''$ such that $q'' \cap q = q'$ and
$q'' = q'|a|\ldots : a \in \rho(h),\ q \not\to h$ and that are best
responses to $x_{-i}(t)$ is larger than one, then the eigenvalues of
the Jacobian are independent of which sequence $q''$ is chosen as best
response.

Conclusions and future works 

In this paper we developed efficient evolutionary game theory
techniques to deal with extensive-form games. We designed, to the best
of our knowledge, the first replicator dynamics applicable to the
sequence form of an extensive-form game, allowing an exponential
reduction of time and space w.r.t. the standard (normal-form)
replicator dynamics. Our replicator dynamics is realization equivalent
to the standard one and therefore these two replicator dynamics evolve
in the same way. We show the equivalence for both the discrete-time
and continuous-time cases. Finally, we discuss how standard tools from
dynamical systems for the study of the stability of strategies can be
adopted with our continuous-time replicator dynamics.

In the future, we intend to explore the following problems: extending
the results on multi-agent learning when sequence form is adopted,
taking into account also Nash refinements for extensive-form games (we
recall that, while this is possible with sequence form, it is not with
the normal form); extending our results to other forms of dynamics,
e.g., best response dynamics, imitation dynamics, smoothed best
replies, the Brown-von Neumann-Nash dynamics; comparing the
expressivity and the effectiveness of replicator dynamics when applied
to the three representation forms.



Appendix 

Relation between normal-form/behavioral/sequence-form strategies

We briefly review how realization equivalent strategies can
be derived according to (vS96).

Given a behavioral strategy $\sigma_i$, we can derive the
(realization) equivalent normal-form strategy and sequence-form
strategy as follows

$$\pi_i(p) = \prod_{a \in p} \sigma_i(a) \quad \forall p \in P_i \qquad (22)$$

$$x_i(q) = \prod_{a \in q} \sigma_i(a) \quad \forall q \in Q_i \qquad (23)$$



Given a normal-form strategy $\pi_i$, we can derive the
(realization) equivalent behavioral strategy:



p^P'.aep 



(24) 



Given a normal-form strategy $\pi_i$ in reduced normal form,
we can derive the (realization) equivalent sequence-form strategy:

$$x_i(q|a) = \sum_{p \in P: a \in p} \pi_i(p) \qquad (25)$$
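The conversions (22), (23) and (25) can be sketched in a few lines of Python. The behavioral strategy `SIGMA` and the tree it refers to (1.2 and 1.3 both reached through R1) are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction as F
from itertools import product
from math import prod

# Hypothetical behavioral strategy of agent 1.
SIGMA = {"L1": F(1, 3), "R1": F(2, 3), "L2": F(1, 4), "R2": F(3, 4), "L3": F(1, 2), "R3": F(1, 2)}
Q1 = [(), ("L1",), ("R1",), ("R1", "L2"), ("R1", "R2"), ("R1", "L3"), ("R1", "R3")]
# Reduced-normal-form plans of agent 1 in the assumed tree.
PLANS = [("L1",)] + [("R1", a, b) for a, b in product(["L2", "R2"], ["L3", "R3"])]


def normal_from_behavioral(sigma):
    """Equation (22): pi(p) is the product of sigma(a) over the actions in p."""
    return {p: prod(sigma[a] for a in p) for p in PLANS}


def sequence_from_behavioral(sigma):
    """Equation (23): x(q) is the product of sigma(a) over the actions in q."""
    return {q: prod((sigma[a] for a in q), start=F(1)) for q in Q1}


def sequence_from_normal(pi):
    """Equation (25): x(q|a) sums pi(p) over the plans p that contain a."""
    return {q: F(1) if not q else sum(pi[p] for p in PLANS if q[-1] in p) for q in Q1}


pi = normal_from_behavioral(SIGMA)
```

Going from the behavioral strategy to the sequence form directly, or through the normal form, gives the same vector in this example.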



We denote by q{a) the sequence whose last action is a. We 
state the following lemma that we use to prove a main result. 

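Lemma 14 Given:

• a normal-form strategy $\pi_i$ in reduced normal form,

• its equivalent behavioral strategy $\sigma_i$,

• a subset of actions $\{a_1, \ldots, a_m\} \subseteq A_i$,

it holds

$$\sum_{p \in P: a_1, \ldots, a_m \in p} \pi_i(p) = \prod_{a \in \bigcup_{j=1}^{m} q(a_j)} \sigma_i(a) \qquad (26)$$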



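Proof. Suppose that $p = a_1, \ldots, a_n$. By (22) we know that

$$\pi_i(p) = \sigma_i(a_1) \cdots \sigma_i(a_n) \qquad (27)$$

For all plans of actions $p \in P$ where
$\{a_1, \ldots, a_m\} \subseteq p$, given an action $a$ such that
$a \notin \{a_1, \ldots, a_m\}$, we can have two possibilities:

1. $a \in \bigcup_{j=1}^{m} q(a_j)$, in this case the action $a$ is
present in every plan of actions $p$, being always present
$\{a_1, \ldots, a_m\}$; thus

$$\sum_{p \in P: a_1, \ldots, a_m \in p} \pi_i(p) = \sigma_i(a) \cdot \prod_{j=1}^{m} \sigma_i(a_j) \cdot \sum_{p \in P: a_1, \ldots, a_m \in p} \left( \sigma_i(a_{m+2}) \cdots \sigma_i(a_n) \right)$$

2. $a \notin \bigcup_{j=1}^{m} q(a_j)$, in this case there is a subset
$P' \subseteq P$ such that there is exactly one $p \in P'$ for each
action $a' \in \rho(h)$, where $a \in \rho(h)$.

By definition of behavioral strategy we know that

$$\sum_{a \in \rho(h)} \sigma_i(a) = 1 \quad \forall h \in H$$

Thus

$$\prod_{j=1}^{m} \sigma_i(a_j) \cdot \sum_{p \in P: a_1, \ldots, a_m \in p} \left( \sigma_i(a_{m+2}) \cdots \sigma_i(a_n) \right) \cdot \sum_{a \in \rho(h)} \sigma_i(a) = \prod_{j=1}^{m} \sigma_i(a_j) \cdot \sum_{p \in P: a_1, \ldots, a_m \in p} \left( \sigma_i(a_{m+2}) \cdots \sigma_i(a_n) \right)$$

Thus, we can write

$$\sum_{p \in P: a_1, \ldots, a_m \in p} \pi_i(p) = \sum_{p \in P: a_1, \ldots, a_m \in p} \sigma_i(a_1) \cdots \sigma_i(a_n)$$

where, by Point 1, we know that all the actions $a$ that are in the
path of some $a_1, \ldots, a_m$, i.e.,
$a \in \bigcup_{j=1}^{m} q(a_j)$, are present in every plan of actions

$$\sum_{p \in P: a_1, \ldots, a_m \in p} \pi_i(p) = \prod_{a \in \bigcup_{j=1}^{m} q(a_j)} \sigma_i(a) \cdot \sum_{p \in P: a_1, \ldots, a_m \in p} \sigma_i(a_{m+k}) \cdots \sigma_i(a_n)$$

and, by Point 2, the other actions sum to "1"

$$\sum_{p \in P: a_1, \ldots, a_m \in p} \pi_i(p) = \prod_{a \in \bigcup_{j=1}^{m} q(a_j)} \sigma_i(a)$$

This completes the proof of the lemma. □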
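Identity (26) can be checked exhaustively on a small instance. The Python sketch below uses a hypothetical behavioral strategy and an assumed tree (1.2 and 1.3 both reached through R1), and it restricts the check to action subsets that appear together in at least one plan, which is an assumption of this sketch about the intended scope of the lemma.

```python
from fractions import Fraction as F
from itertools import combinations, product
from math import prod

# Hypothetical behavioral strategy of agent 1.
SIGMA = {"L1": F(1, 3), "R1": F(2, 3), "L2": F(1, 4), "R2": F(3, 4), "L3": F(1, 2), "R3": F(1, 2)}
# q(a): the sequence whose last action is a, in the assumed tree.
SEQ_OF = {"L1": ("L1",), "R1": ("R1",), "L2": ("R1", "L2"), "R2": ("R1", "R2"),
          "L3": ("R1", "L3"), "R3": ("R1", "R3")}
PLANS = [("L1",)] + [("R1", a, b) for a, b in product(["L2", "R2"], ["L3", "R3"])]
# Equation (22): product-form normal-form strategy.
PI = {p: prod(SIGMA[a] for a in p) for p in PLANS}


def lhs(actions):
    """Total probability of the plans that contain every action in `actions`."""
    return sum(PI[p] for p in PLANS if set(actions) <= set(p))


def rhs(actions):
    """Product of sigma(a) over the union of the sequences q(a_j)."""
    on_path = set().union(*(SEQ_OF[a] for a in actions))
    return prod(SIGMA[a] for a in on_path)


def compatible_subsets():
    """All action subsets that appear together in at least one plan."""
    return {frozenset(c) for p in PLANS for r in range(len(p) + 1) for c in combinations(p, r)}
```

For instance, `lhs({"L2", "R3"})` and `rhs({"L2", "R3"})` both evaluate to 1/12 with the values above.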

Proof of Theorem 13

Proof. For each sequence $q''$ that is a best response to
$x_{-i}(t)$, we can have different vectors
$g_q(x_i(t), x_{-i}(t))$. Suppose to take two different vectors
$g_q(x_i(t), x_{-i}(t))$ and $g'_q(x_i(t), x_{-i}(t))$. To prove the
equality of the two Jacobians we have to prove that each term is the
same. All the terms multiplied by $x_i(q,t) = 0$ can be discarded,
they being equal to zero. For this reason the only term different from
0 in the Jacobian is
$\frac{\partial \dot{x}_i(q,t)}{\partial x_i(q,t)}$, thus we have to
prove

$$(g_q(x_i(t), x_{-i}(t)) - x_i(t))^T \cdot U_i \cdot x_{-i}(t) = (g'_q(x_i(t), x_{-i}(t)) - x_i(t))^T \cdot U_i \cdot x_{-i}(t) \qquad (28)$$

We can rewrite the equality (28) as

$$g_q^T(x_i(t), x_{-i}(t)) \cdot U_i \cdot x_{-i}(t) = g_q'^T(x_i(t), x_{-i}(t)) \cdot U_i \cdot x_{-i}(t)$$

that always holds because $g_q(x_i(t), x_{-i}(t))$ and
$g'_q(x_i(t), x_{-i}(t))$, even if they differ for some components,
provide the same expected utility by definition of best response.
Even if an agent randomizes over multiple best responses, the theorem
holds for the same reason. □